Adding Sanitizer interface and implementation by longquanzheng · Pull Request #97 · uber-java/tally

longquanzheng · 2021-04-08T01:36:06Z

This is a replacement for #70

Sanitizes metric names and tag keys and values.
Does nothing by default.
The ScopeBuilder can be configured with the standard M3Sanitizer.
Fixes #8 (Add Metrics Sanitizer)

CLAassistant · 2021-04-08T01:36:11Z

All committers have signed the CLA.

longquanzheng · 2021-04-08T01:46:00Z

@alexeykudinkin Hi, can you take a look? I have addressed the comments from the previous PR. LMK if you have any others.

longquanzheng · 2021-04-08T16:19:14Z

looks like unit tests are failing. Working on fixing

longquanzheng · 2021-04-10T23:20:11Z

@alexeykudinkin Hi, can you take a look? I have fixed all the tests. Thanks in advance.

longquanzheng · 2021-04-13T23:34:20Z

@alexeykudinkin @andrewmains12 Hello, sorry to bother you guys again. I would really appreciate if you can help review it.

alexeykudinkin · 2021-04-14T05:08:11Z

@longquanzheng we would also need to write a benchmark for this one.

core/src/main/java/com/uber/m3/tally/sanitizers/SanitizerImpl.java

alexeykudinkin · 2021-04-14T05:01:56Z

core/src/main/java/com/uber/m3/tally/sanitizers/StringSanitizer.java

+ * StringSanitizer is to sanitize strings
+ * It has a Sanitize method which returns a sanitized version of the input string value.
+ */
+public interface StringSanitizer {


I don't think we need standalone interface for this, Function interface should be reflective

Yeah agreed. It's too heavy for a function to have interface.
You meant functional interface/lambda, correct?

Changed to Function<String,String>
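
For readers following this thread, here is a minimal sketch of what using `java.util.function.Function<String, String>` in place of a dedicated interface might look like. The class name `FunctionSanitizerSketch` and the replacement rule (any character outside `[A-Za-z0-9_]` becomes `_`) are illustrative assumptions, not the code in this PR.

```java
import java.util.function.Function;

final class FunctionSanitizerSketch {
    // Hypothetical rule: replace every character outside [A-Za-z0-9_] with '_'.
    static final Function<String, String> REPLACE_INVALID = value -> {
        StringBuilder out = new StringBuilder(value.length());
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            boolean valid = (c >= 'a' && c <= 'z')
                || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9')
                || c == '_';
            out.append(valid ? c : '_');
        }
        return out.toString();
    };

    public static void main(String[] args) {
        // Prints the sanitized form of a name containing '.' and '-'.
        System.out.println(REPLACE_INVALID.apply("http.requests-total"));
    }
}
```

Because the sanitizer is just a `Function`, callers can pass a lambda or method reference without implementing a named type.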

core/src/main/java/com/uber/m3/tally/sanitizers/SanitizeRange.java

alexeykudinkin · 2021-04-14T05:03:52Z

core/src/main/java/com/uber/m3/tally/sanitizers/SanitizeRange.java

+
+    private SanitizeRange(char low, char high) {
+        this.low = low;
+        this.high = high;


Please add precondition asserting that high >= low

I couldn't find an existing example of a "Precondition" (maybe I didn't search the right way), so I throw a RuntimeException here. LMK if you want to assert in a different way.

Yeah, precondition reference was mostly notional (please do not import Guava for that)

longquanzheng · 2021-04-16T05:41:06Z

@alexeykudinkin thanks so much for your reviews. I have addressed your comments except for the benchmark tests. Could you give me more details about how to write/add them?

alexeykudinkin · 2021-04-16T21:37:20Z

core/src/main/java/com/uber/m3/tally/ScopeBuilder.java

    protected String separator = DEFAULT_SEPARATOR;
    protected ImmutableMap<String, String> tags;
    protected Buckets defaultBuckets = DEFAULT_SCOPE_BUCKETS;
+    protected ScopeSanitizer sanitizer = new ScopeSanitizerBuilder().build();


Let's make this opt-in -- by default sanitizer should be no-op

Do you mean having it as null by default in the builder?

This would be a little bit tricky. Then the ScopeImpl will have to do this null check everywhere, is that okay? (It would be easier if this is Kotlin and we can use ? operator, sigh)

NVM, looks like I will apply another comment from you and Andrey in #97 (comment)

alexeykudinkin · 2021-04-16T21:38:11Z

@longquanzheng LGTM, minor comment.

Regarding benchmarks, you can take a look at the benchmarks package for some examples;

SokolAndrey · 2021-04-19T04:06:36Z

core/src/main/java/com/uber/m3/tally/ScopeImpl.java

+        }
+
+        ImmutableMap.Builder<String, String> builder = new ImmutableMap.Builder<>();
+        if (tags != null) {


nit: this check is redundant. If tags is null it will fail on tags.entrySet()

I'd rather re-phrase this: @longquanzheng please move this check to the beginning to make sure there's no NPE

yeah good point. Thanks
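
A rough sketch of the "check at the beginning" shape being asked for. It uses `java.util.Map` instead of tally's `ImmutableMap` so that it stands alone; the method name `sanitizeTags` and the choice to return an empty map for `null` input are assumptions made for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class TagSanitizingSketch {
    static Map<String, String> sanitizeTags(
            Map<String, String> tags,
            Function<String, String> tagKeySanitizer,
            Function<String, String> tagValueSanitizer) {
        // Guard first, so nothing below can dereference a null map.
        if (tags == null) {
            return Collections.emptyMap(); // assumption: null is treated as "no tags"
        }

        Map<String, String> sanitized = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : tags.entrySet()) {
            sanitized.put(
                tagKeySanitizer.apply(entry.getKey()),
                tagValueSanitizer.apply(entry.getValue()));
        }
        return Collections.unmodifiableMap(sanitized);
    }
}
```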

SokolAndrey · 2021-04-19T04:09:07Z

core/src/main/java/com/uber/m3/tally/sanitizers/CharRange.java

+
+    private CharRange(char low, char high) {
+        if (low > high) {
+            throw new RuntimeException("invalid CharRange");


nit: s/RuntimeException/IllegalArgumentException?

Done. Thanks
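
A minimal sketch of the constructor check after this change, assuming a private constructor as in the diff. The class name `CharRangeSketch`, the factory `of`, and the helper `contains` are hypothetical and only there to make the example self-contained.

```java
final class CharRangeSketch {
    private final char low;
    private final char high;

    private CharRangeSketch(char low, char high) {
        // Reject reversed bounds up front; low == high is a valid one-character range.
        if (low > high) {
            throw new IllegalArgumentException(
                "invalid range: low '" + low + "' is greater than high '" + high + "'");
        }
        this.low = low;
        this.high = high;
    }

    // Hypothetical factory; the PR's actual factory name is not shown in this thread.
    static CharRangeSketch of(char low, char high) {
        return new CharRangeSketch(low, high);
    }

    // Hypothetical helper for illustration.
    boolean contains(char c) {
        return c >= low && c <= high;
    }
}
```

`IllegalArgumentException` is still unchecked, so callers are not forced to catch it, but it tells them the failure came from a bad argument rather than from some arbitrary runtime fault.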

SokolAndrey · 2021-04-19T04:12:54Z

core/src/main/java/com/uber/m3/tally/sanitizers/SanitizerImpl.java

+    private final Function<String, String> keySanitizer;
+    private final Function<String, String> valueSanitizer;
+
+    SanitizerImpl(Function<String, String> nameSanitizer, Function<String, String> keySanitizer, Function<String, String> valueSanitizer) {


nit: I'd make arguments name more explicit: tagKeySanitizer, tagValueSanitizer

yeah good catch -- I should have done it, hehe

SokolAndrey · 2021-04-19T04:17:32Z

core/src/main/java/com/uber/m3/tally/sanitizers/ScopeSanitizerBuilder.java

+ */
+public class ScopeSanitizerBuilder {
+
+    private Function<String, String> nameSanitizer = value -> value;


nit: consider Function.identity()

Thanks! TIL
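
For context, `Function.identity()` is the standard-library way to express "return the input unchanged". The sketch below shows it as a builder default; `SanitizerBuilderSketch` and `withNameSanitizer` are hypothetical names, not the PR's API.

```java
import java.util.function.Function;

final class SanitizerBuilderSketch {
    // Default: pass the value through untouched.
    private Function<String, String> nameSanitizer = Function.identity();

    // Hypothetical setter for illustration.
    SanitizerBuilderSketch withNameSanitizer(Function<String, String> fn) {
        this.nameSanitizer = fn;
        return this;
    }

    Function<String, String> build() {
        return nameSanitizer;
    }
}
```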

SokolAndrey · 2021-04-19T04:19:04Z

core/src/main/java/com/uber/m3/tally/sanitizers/ScopeSanitizerBuilder.java

+
+/**
+ * The SanitizerBuilder returns a Sanitizer for the name, key and value. By
+ * default, the name, key and value sanitize functions returns all the input


Maybe create a NoopSanitizer implements ScopeSanitizer and use it as a default one instead?

Yeah I like this idea better.
Otherwise having null in ScopeImpl will require lots of null checking in the code.
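
A sketch of the no-op default idea, under stated assumptions: the interface method names (`sanitizeName`, `sanitizeTagKey`, `sanitizeTagValue`) and the builder methods are hypothetical, since the actual `ScopeSanitizer` signature is not shown in this thread.

```java
// Hypothetical interface shape; the real ScopeSanitizer may differ.
interface ScopeSanitizerSketch {
    String sanitizeName(String name);
    String sanitizeTagKey(String key);
    String sanitizeTagValue(String value);
}

// No-op implementation: every method returns its input unchanged.
final class NoopSanitizerSketch implements ScopeSanitizerSketch {
    @Override public String sanitizeName(String name) { return name; }
    @Override public String sanitizeTagKey(String key) { return key; }
    @Override public String sanitizeTagValue(String value) { return value; }
}

final class ScopeBuilderSketch {
    // Never null, so the scope can call the sanitizer unconditionally.
    private ScopeSanitizerSketch sanitizer = new NoopSanitizerSketch();

    ScopeBuilderSketch sanitizer(ScopeSanitizerSketch sanitizer) {
        this.sanitizer = sanitizer;
        return this;
    }

    String metricName(String raw) {
        return sanitizer.sanitizeName(raw);
    }
}
```

This is the null-object pattern: sanitizing becomes opt-in without scattering `if (sanitizer != null)` checks through the scope implementation.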

SokolAndrey · 2021-04-19T04:44:43Z

core/src/main/java/com/uber/m3/tally/sanitizers/ScopeSanitizerBuilder.java

@@ -0,0 +1,89 @@
+// Copyright (c) 2020 Uber Technologies, Inc.


pls update copyright to 2021

SokolAndrey · 2021-04-19T04:47:27Z

core/src/test/java/com/uber/m3/tally/sanitizers/SanitizeRangeTest.java

+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+
+public class SanitizeRangeTest {


nit: s/SanitizeRangeTest/CharRangeTest

SokolAndrey · 2021-04-19T04:48:56Z

core/src/test/java/com/uber/m3/tally/sanitizers/SanitizeRangeTest.java

+    private static final char HIGH = 'z';
+
+    @Test
+    public void sanitizeRange() {


pls add a test when HIGH is smaller than LOW and maybe when HIGH == LOW

SokolAndrey · 2021-04-19T04:50:19Z

core/src/test/java/com/uber/m3/tally/sanitizers/SanitizerTest.java

+
+import static org.junit.Assert.assertEquals;
+
+public class SanitizerTest {


nit: s/SanitizerTest/SanitizerImplTest or ScopeSanitizerTest

changed to ScopeSanitizerTest

SokolAndrey · 2021-04-19T04:55:09Z

m3/src/main/java/com/uber/m3/tally/m3/M3Sanitizer.java

+            .withKeyCharacters(
+                ValidCharacters.of(
+                    ValidCharacters.ALPHANUMERIC_RANGE,
+                    ValidCharacters.UNDERSCORE_DASH_CHARACTERS))


iirc dot is also a valid character but not recommended

changed to UNDERSCORE_DASH_CHARACTERS

SokolAndrey

Since you're modifying ScopeImpl's default behaviour (even though it's a no-op by default), it would be nice to see how it affects the benchmarks.
You can run those benchmark tests using the following command:

./gradlew tally-core:runJmhTests

Could you run it without and with your changes and add both results to the PR?
It would be nice to have separate benchmarks for sanitizer implementation though, but I don't think it has to be a part of this PR. @alexeykudinkin wdyt?

you can find more info/examples on JMH benchmarks here

longquanzheng · 2021-04-20T17:20:11Z

Thanks both for the review. I will address the comments this week.

longquanzheng · 2021-04-22T19:15:58Z

@alexeykudinkin @SokolAndrey thank you so much again for the review.

I have addressed all the comments. Below are the benchmark results on my laptop:

Before the PR(using master)

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1162.629 ± 67.777  ops/ms

After the PR:

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1146.372 ± 30.429  ops/ms

My second run on current PR:

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1151.460 ± 53.541  ops/ms

alexeykudinkin · 2021-04-23T18:41:11Z

@longquanzheng i assume these runs are w/ no-op sanitizer? Can you also paste Benchmarks w/ the sanitizer actually used?

longquanzheng · 2021-04-23T18:58:13Z

@longquanzheng i assume these runs are w/ no-op sanitizer? Can you also paste Benchmarks w/ the sanitizer actually used?

Yes. Thanks for pointing it out.
I am testing with sanitizer now. Is that the right way? longquanzheng@5f2993c

longquanzheng · 2021-04-23T19:03:21Z

@longquanzheng i assume these runs are w/ no-op sanitizer? Can you also paste Benchmarks w/ the sanitizer actually used?

Yes. Thanks for pointing it out.
I am testing with sanitizer now. Is that the right way? longquanzheng@5f2993c

Just ran with the commit of sanitizer:

Benchmark                                    Mode  Cnt     Score    Error   Units
ScopeImplBenchmark.scopeReportingBenchmark  thrpt   10  1485.831 ± 40.634  ops/ms

alexeykudinkin · 2021-04-23T21:27:56Z

@longquanzheng i don't think we have any Benchmarks before targeting specifically code paths you're changing. Let's make sure that Benchmarks are stressing code paths that are changing.

add benchmark code

longquanzheng · 2021-04-23T21:52:47Z

@longquanzheng i don't think we have any Benchmarks before targeting specifically code paths you're changing. Let's make sure that Benchmarks are stressing code paths that are changing.

I got what you mean. Just added benchmark code. Can you take another look?

longquanzheng · 2021-04-23T22:04:52Z

@alexeykudinkin here are the results:

Benchmark                                                 Mode  Cnt   Score   Error   Units
ScopeImplBenchmark.recordingBenchmark                    thrpt   10  75.436 ± 2.179  ops/ms
ScopeImplBenchmark.recordingWithSanitizingDashBenchmark  thrpt   10  53.200 ± 1.539  ops/ms

recordingBenchmark is the default without a sanitizer; recordingWithSanitizingDashBenchmark is with sanitizers.
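
To put the two scores above in relative terms, here is a small sketch of the arithmetic. The helper `percentDrop` is a hypothetical name; the inputs are the scores reported in the table. The sanitizing run comes out roughly 29.5% lower in throughput, and the reported error ranges (75.436 ± 2.179 and 53.200 ± 1.539) do not overlap.

```java
final class ThroughputDeltaSketch {
    // Percentage by which `candidate` throughput is lower than `baseline`.
    static double percentDrop(double baseline, double candidate) {
        return (baseline - candidate) / baseline * 100.0;
    }

    public static void main(String[] args) {
        // Scores (ops/ms) copied from the results above.
        double drop = percentDrop(75.436, 53.200);
        System.out.printf("throughput drop: %.1f%%%n", drop);
    }
}
```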

alexeykudinkin · 2021-04-26T23:00:57Z

core/benchmark-tests.txt

@@ -0,0 +1,4 @@
+Benchmark                                                 Mode  Cnt      Score     Error   Units
+ScopeImplBenchmark.recordingBenchmark                    thrpt   10     74.819 ±   4.573  ops/ms


Please check how other benchmarks are reported (we need to capture all data-points offered by JMH)

Sorry, I don't really understand how exactly to do it.

Would you mind taking it over? I am not convinced that this should be done in this PR, as the benchmark was already missing. The main problem is that I don't have a clear picture of what you really want for benchmarking (without detailed documentation), and it would be much easier if your team just did it.

LMK if you agree.

It should be done by this PR. This is a critical library lying in the hot path of execution of many services, hence we need to maintain the focus on its performance.

Over the last year a lot of effort has been put into optimizing, streamlining, and making this library robust enough to serve the needs it was originally built for. Therefore, as a new contribution guideline, we require any non-trivial change to adhere to the same basic principles of assuring the library's correctness and performance.

We totally appreciate the amount of incremental effort that is required from every contribution to adhere to these heightened standards, but unfortunately there's no other way to guarantee a high level of reliability and performance for the open-source library.

Please check how other benchmarks are reported

Can you give more details here? I did check but don't know how that works (though I see other reports have details like GC time).
For example, another benchmark is just
public class M3ReporterBenchmark extends AbstractReporterBenchmark
so I don't know what I am missing here.

@SokolAndrey can you help here?

alexeykudinkin · 2021-04-26T23:04:48Z

core/src/jmh/java/com/uber/m3/tally/ScopeImplBenchmark.java

+                    .reportEvery(Duration.MAX_VALUE);
+        }

+        public void recordTestMetrics(final ScopeImpl scope) {


I don't think this is a good routine to check -- this was used as pre-init seq, and i would suggest to keep it as such.

Instead create separate benchmarks and routines to update counter/gauge/histogram separately.

Same as my previous response. I appreciate your time reviewing and commenting on it.
But it would be nice if you could help with the benchmarking.

Please check my comment above. We're more than happy to help w/ guidance, but benchmarking is now a standard requirement from non-trivial contributions.

Sure, will revert that one back.

Can you give me details about what you want to benchmark? -- which method/function/routines?

As I have called out before, let's test creation and recording for each type of metric individually (counter/gauge/histogram).

// Counter
scope.counter("counter").inc(1)
// Timer
scope.timer("timer").record(...)
// Histogram
...

ravirajj and others added 3 commits April 7, 2021 18:02

Adding Sanitizer interface and implementation

0f77624

address comments to avoid copy

afa6c6d

rename and comment

a772d57

longquanzheng added 2 commits April 7, 2021 18:43

fix error

a188eee

rename

fc8f1aa

longquanzheng added 2 commits April 10, 2021 11:35

fix test

edc0a4e

fix style

d8fdac9

alexeykudinkin reviewed Apr 14, 2021

longquanzheng added 3 commits April 15, 2021 22:27

adderss comments

45243d8

Use Function interface

a27e6d3

fix style

5aa1dc5

alexeykudinkin reviewed Apr 16, 2021

SokolAndrey reviewed Apr 19, 2021

SokolAndrey approved these changes Apr 20, 2021

longquanzheng added 7 commits April 22, 2021 10:16

address comments

1b941ff

NoopSanitizer

6e4f04a

adress commeents

5328ba1

address comments for validCharacters

5cb9f0f

address comments for unit tests

3e34116

use UNDERSCORE_DASH_CHARACTERS

dde5ff4

fix format

fbbf111

longquanzheng added 2 commits April 22, 2021 11:56

more tests

f1d8f25

more fix for copyright

bb561ec

Bench with sanitizer

315c559

add benchmark code

close at teardown

d0c33f8

longquanzheng added 2 commits April 23, 2021 15:22

add bench results

08ed2d5

fix style

8b09947

alexeykudinkin reviewed Apr 26, 2021

alexeykudinkin suggested changes Apr 26, 2021
